Supervised Learning - Foundations Project: ReCell

Context

Buying and selling used phones and tablets used to be something that happened on a handful of online marketplace sites. But the used and refurbished device market has grown considerably over the past decade, and a new IDC (International Data Corporation) forecast predicts that the used phone market would be worth $52.7bn by 2023 with a compound annual growth rate (CAGR) of 13.6% from 2018 to 2023. This growth can be attributed to an uptick in demand for used phones and tablets that offer considerable savings compared with new models.

Refurbished and used devices continue to provide cost-effective alternatives to both consumers and businesses that are looking to save money when purchasing a device. There are plenty of other benefits associated with the used device market. Used and refurbished devices can be sold with warranties and can also be insured with proof of purchase. Third-party vendors and platforms, such as Verizon and Amazon, provide attractive offers to customers for refurbished devices. Maximizing the longevity of devices through second-hand trade also reduces their environmental impact and helps with recycling and reducing waste. The impact of the COVID-19 outbreak may further boost this segment as consumers cut back on discretionary spending and buy phones and tablets only for immediate needs.

Objective

The rising potential of this comparatively under-the-radar market fuels the need for an ML-based solution to develop a dynamic pricing strategy for used and refurbished devices. ReCell, a startup aiming to tap the potential in this market, has hired you as a data scientist. They want you to analyze the data provided and build a linear regression model to predict the price of a used phone/tablet and identify factors that significantly influence it.

Data Description

The data contains the different attributes of used/refurbished phones and tablets. The data was collected in the year 2021. The detailed data dictionary is given below.

Data Dictionary

Importing necessary libraries and data

Data Overview

Observations:

Observations:

Observations:

Observations:

Exploratory Data Analysis (EDA)

Brand Name

Observations:

Operating System

Observations:

Screen Size

Observations:

4G or 5G

Observations:

Main Camera MP

Observations:

Selfie Camera MP

Observations:

Internal Memory

Observations:

Ram

Observations:

Battery

Observations:

Weight

Observations:

Release Year

Observations:

Days Used

Observations:

Normalized New Price

Observations:

Normalized Used Price

Observations:

Q. What does the distribution of normalized used device prices look like?

Observations:

Observations:

Q. What percentage of the used device market is dominated by Android devices?
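One way to answer this is to take the relative frequency of each operating system. The sketch below uses a small toy DataFrame; the column name 'os' and all values are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Toy stand-in for the ReCell data: the 'os' column name and its values are
# hypothetical and only illustrate the calculation.
df = pd.DataFrame({"os": ["Android", "Android", "iOS", "Android", "Others",
                          "Windows", "Android", "Android", "iOS", "Android"]})

# Share of each operating system as a percentage of all devices
os_share = df["os"].value_counts(normalize=True) * 100
android_pct = os_share["Android"]
print(f"Android devices: {android_pct:.1f}% of the data")
```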

Observations:

Q. The amount of RAM is important for the smooth functioning of a device. How does the amount of RAM vary with the brand?
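A quick numeric summary of RAM by brand can come from a group-by. This is a sketch on toy data; the column names 'brand_name' and 'ram' and the values are assumptions, not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data; 'brand_name' and 'ram' (GB) are assumed column names
df = pd.DataFrame({
    "brand_name": ["Apple", "Apple", "Samsung", "Samsung", "Nokia", "Nokia"],
    "ram": [4.0, 6.0, 8.0, 4.0, 1.0, 2.0],
})

# Mean RAM per brand, highest first
ram_by_brand = df.groupby("brand_name")["ram"].mean().sort_values(ascending=False)
print(ram_by_brand)
```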

Observations:

Q. A large battery often increases a device's weight, making it feel uncomfortable in the hands. How does the weight vary for phones and tablets offering large batteries (more than 4500 mAh)?
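The question reduces to filtering on battery capacity and summarizing weight. A minimal sketch on hypothetical toy values follows; 'battery' (mAh) and 'weight' (grams) are assumed column names and units.

```python
import pandas as pd

# Hypothetical toy data; 'battery' in mAh and 'weight' in grams are assumptions
df = pd.DataFrame({
    "battery": [3000, 4600, 5000, 6000, 4500, 7000],
    "weight": [150.0, 190.0, 200.0, 460.0, 180.0, 500.0],
})

# Keep only devices with batteries strictly larger than 4500 mAh
large_battery = df[df["battery"] > 4500]
print(large_battery["weight"].describe())
```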

Observations:

Q. Bigger screens are desirable for entertainment purposes as they offer a better viewing experience. How many phones and tablets are available across different brands with a screen size larger than 6 inches?
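If screen size is stored in centimetres (an assumption here, as the data dictionary is not reproduced above), 6 inches corresponds to 6 × 2.54 = 15.24 cm. The sketch below applies that threshold to hypothetical toy data and counts devices per brand.

```python
import pandas as pd

INCH_TO_CM = 2.54
# Assumes 'screen_size' is recorded in cm; use 6 directly if it is in inches
threshold_cm = 6 * INCH_TO_CM  # about 15.24 cm

# Hypothetical toy data with assumed column names
df = pd.DataFrame({
    "brand_name": ["Samsung", "Samsung", "Apple", "Huawei", "Huawei", "Nokia"],
    "screen_size": [15.34, 12.70, 15.90, 16.30, 10.16, 14.50],
})

big_screens = df[df["screen_size"] > threshold_cm]
counts = big_screens["brand_name"].value_counts()
print(counts)
```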

Observations:

Q. A lot of devices nowadays offer great selfie cameras, allowing us to capture our favorite moments with loved ones. What is the distribution of devices offering greater than 8MP selfie cameras across brands?
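The same filter-then-count pattern applies to selfie cameras. Column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; 'selfie_camera_mp' is an assumed column name
df = pd.DataFrame({
    "brand_name": ["Apple", "Apple", "Samsung", "Samsung", "Xiaomi", "Nokia"],
    "selfie_camera_mp": [12.0, 7.0, 32.0, 16.0, 20.0, 5.0],
})

# Devices with selfie cameras strictly greater than 8 MP, counted per brand
selfie = df[df["selfie_camera_mp"] > 8]
selfie_counts = selfie["brand_name"].value_counts()
print(selfie_counts)
```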

Observations:

Q. Which attributes are highly correlated with the normalized price of a used device?
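A sketch of ranking numeric attributes by their Pearson correlation with the target. The data is synthetic: the toy used price is deliberately generated from the toy new price, so the ranking only demonstrates the mechanics, not a finding.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
new_price = rng.normal(5.2, 0.7, n)

# Synthetic data with assumed column names; used price is built from new price
df = pd.DataFrame({
    "normalized_new_price": new_price,
    "normalized_used_price": 0.5 * new_price + rng.normal(0, 0.2, n),
    "days_used": rng.integers(90, 1100, n),
})

# Correlation of every other column with the target, strongest first
corr_with_target = (
    df.corr()["normalized_used_price"]
    .drop("normalized_used_price")
    .sort_values(ascending=False)
)
print(corr_with_target)
```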

Observations:

Data Preprocessing

Observations:

Observations:

Observations:

Observations:

EDA

Observations:

Building a Linear Regression model

Split Data

Split X and y into train and test sets in a 70:30 ratio.
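A 70:30 split is commonly done with scikit-learn's `train_test_split` and `test_size=0.30`; fixing `random_state` makes the split reproducible. The toy `X` and `y` below are placeholders for the real predictors and target.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Placeholder predictors and target with assumed column names
X = pd.DataFrame({
    "normalized_new_price": rng.normal(5, 1, 100),
    "days_used": rng.integers(90, 1100, 100),
})
y = pd.Series(rng.normal(4, 0.5, 100), name="normalized_used_price")

# 70% of rows for training, 30% held out for testing
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)
print(x_train.shape, x_test.shape)
```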

Fit Linear Model

Interpretation of R-squared

Model performance evaluation
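A hypothetical helper, `regression_metrics`, that reports common regression metrics (RMSE, MAE, R-squared, adjusted R-squared, MAPE) for a set of predictions. Calling it on both the training and the test predictions gives a rough check for overfitting. It is a sketch, not the notebook's own function.

```python
import numpy as np
import pandas as pd

def adj_r2_score(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def regression_metrics(y_true, y_pred, k):
    """Hypothetical helper; k is the number of predictors (excluding the intercept)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return pd.DataFrame({
        "RMSE": [np.sqrt(np.mean(err ** 2))],
        "MAE": [np.mean(np.abs(err))],
        "R-squared": [r2],
        "Adj. R-squared": [adj_r2_score(r2, len(y_true), k)],
        "MAPE": [np.mean(np.abs(err / y_true)) * 100],  # assumes no zero targets
    })

perf = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5], k=1)
print(perf)
```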

Let's check the VIF of the predictors

Let's drop 'battery', since dropping it has no effect on the adjusted R-squared.

Let's check if multicollinearity is still present in the data.

Since there is a very small effect (0.002) on adj. R-squared after dropping the 'release_year' column, we can remove it from the training set.

Let's check to see if multicollinearity is still present.

Multicollinearity is still present in the data, so we drop the 'screen_size' column as well.

Observations:

Linear Regression Assumptions

TEST FOR LINEARITY AND INDEPENDENCE

Observations

TEST FOR NORMALITY
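Normality of residuals is usually checked visually (histogram, Q-Q plot) and with the Shapiro-Wilk test, whose null hypothesis is that the data is normally distributed. A sketch with SciPy, using simulated values as a stand-in for the model's residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
residuals = rng.normal(0, 0.2, 500)  # stand-in for the fitted model's residuals

# Shapiro-Wilk: p > 0.05 means we fail to reject normality at the 5% level
stat, p_value = stats.shapiro(residuals)
is_normal = p_value > 0.05

# Q-Q plot data; r close to 1 means points lie near the reference line
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(p_value, is_normal, r)
```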

Observations

Observations

Observations

TEST FOR HOMOSCEDASTICITY

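The null and alternate hypotheses of the Goldfeld-Quandt test are as follows:

Null hypothesis: the residuals are homoscedastic (their variance is constant).
Alternate hypothesis: the residuals are heteroscedastic.

If the p-value is greater than the chosen significance level (commonly 0.05), we fail to reject the null hypothesis and treat the residuals as homoscedastic.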

Final Model Summary

All the assumptions of linear regression are now satisfied. Let's check the summary of our final model (olsres_16).

Observations

Let's print the linear regression equation.
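One way to print the equation is to format each fitted coefficient with its sign. `regression_equation` is a hypothetical helper that expects a pandas Series of parameters with a 'const' entry, such as a statsmodels `params` attribute. The slope values below are the ones quoted in the insights section; the dummy-column names and the intercept value are placeholders, not the final model's.

```python
import pandas as pd

def regression_equation(params, target="normalized_used_price"):
    """Format OLS parameters (with a 'const' entry) as a readable equation."""
    terms = [f"{params['const']:.4f}"]
    for name, coef in params.drop("const").items():
        sign = "-" if coef < 0 else "+"
        terms.append(f"{sign} {abs(coef):.4f} * ({name})")
    return f"{target} = " + " ".join(terms)

# Slopes quoted in the insights; names and the intercept are placeholders
params = pd.Series({"const": 1.5, "normalized_new_price": 0.4221,
                    "4g_yes": 0.0936, "5g_yes": 0.0662, "os_Others": -0.1767})
eq = regression_equation(params)
print(eq)
```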

We can now use the model for making predictions on the test data.

Actionable Insights and Recommendations

1) Normalized new price has the largest effect on the used price of the phone. If a company wishes to offer higher-priced used phones, it should seek out phones that were more highly priced when sold new.

2) This data does not offer information on return on investment or profitability. Further research should be done on the profitability of used phones.

3) Phones running common operating systems (Android, iOS, and Windows) should be preferred over others; other operating systems have a negative impact on the normalized used price.

4) 4G and 5G phones should be preferred over phones without them, since both features are associated with a higher normalized used price.

5) A unit increase in the normalized new price is associated with a 0.4221 unit increase in the phone's normalized used price, all other variables remaining constant.

6) A phone having 4G is associated with a 0.0936 unit increase in the phone's normalized used price, all other variables remaining constant.

7) A phone having 5G is associated with a 0.0662 unit increase in the phone's normalized used price, all other variables remaining constant.

8) A phone running any operating system other than Windows, iOS, or Android is associated with a 0.1767 unit decrease in the phone's normalized used price, all other variables remaining constant.
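Because the model is additive, the coefficients quoted above can be combined directly. For example, a device whose normalized new price is 0.5 higher and that also has 4G is predicted to have a normalized used price about 0.4221 × 0.5 + 0.0936 = 0.30465 higher, other variables unchanged. A sketch with a hypothetical helper, `used_price_change`, built only from those quoted coefficients:

```python
# Coefficients as quoted in the insights above; the key names are illustrative
coefs = {"normalized_new_price": 0.4221, "4g": 0.0936, "5g": 0.0662, "os_others": -0.1767}

def used_price_change(delta_new_price=0.0, has_4g=False, has_5g=False, other_os=False):
    """Predicted change in normalized used price vs. a baseline device, rest held fixed."""
    return (coefs["normalized_new_price"] * delta_new_price
            + coefs["4g"] * has_4g
            + coefs["5g"] * has_5g
            + coefs["os_others"] * other_os)

change = used_price_change(delta_new_price=0.5, has_4g=True)
print(round(change, 5))
```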